0

Flexible CSV Importer

Introduction

This plugin takes a CSV file, typically with one row per Individual record, and adds people, facts and simple relationships to a Project. It provides a dialogue to map columns to fields.

It assumes that column headings follow the conventions set out below. If a valid data reference is not found, the plugin will attempt to construct one. Simple terms like DOB, Place of Birth or Spouse's father should be interpreted correctly.

CSV Conventions

Individual records are identified with abbreviations, e.g. P for Primary, S for Spouse, PF for Primary’s Father, SM for Spouse’s Mother, etc.

Standard facts are imported by using their GEDCOM field tags, e.g. NAME, BIRT, MARR, DEAT, CENS.
Custom facts can be imported by entering the appropriate data reference tag, e.g. EVEN-CUSTOM_FACT or _ATTR-CUSTOM_FACT.
Hint: Run an Individual Query and use its Columns tab Fields pane to determine the data reference tag for any custom fact.
They are supplemented with detail tags, e.g. DATE, PLAC, ADDR, AGE, NOTE, TYPE.

Unique references (using the REFN tag) are used to identify the same individual throughout the data. The REFN field does not need to be numeric, so schemes like I1, I2, I3 or F01, F02, M01, M02 or even father, mother, child1 could be used providing that they are all unique. See the Relationships, Multiple Partnerships and Synchronization sections below for typical uses.

Whilst the plugin is reasonably intelligent, field mapping using these Conventions in the CSV file itself is more reliable, which is the approach in the Sample Plugin Demonstration Data below.

Most conventional column headings have the format RECORD.FIELD.DETAIL but there are exceptional Special Cases explained below.

Where…

RECORD identifies the Individual or Family record:

P = Primary ~ the principal person of interest in the record
PF = Primary’s father
PM = Primary’s mother
S = Spouse ~ of the primary person
SF = Spouse’s father
SM = Spouse’s mother

Family identifiers use the two individual abbreviations separated by an underscore:

P_S = the Primary person and their Spouse
PF_PM = Parent family of the Primary person
SF_SM = Parent family of the Spouse

FIELD is the GEDCOM tag or label for the fact, e.g. BIRT, Birth, DEAT, Death, CENS, Census, NAME, Name, REFN, EVEN-CUSTOM_FACT.

DETAIL is the GEDCOM tag for the detail of the field, e.g. DATE, Date, When, PLAC, Place, ADDR, Address, NOTE, Note, TYPE, Type.

For example, the primary person’s birth-place is the P.BIRT.PLAC column. PLAC and ADDR or other columns that contain commas must be enclosed in ʺstring quotesʺ to hide the commas. That formatting is performed automatically by the spreadsheet application (Excel, LibreOffice, etc.).

The marriage date of a primary person’s parents is the PF_PM.MARR.DATE column. However, it is usually better to specify this fact against a parents’ own record (row), as illustrated below, rather than in the child record.

This is a very simple example of CSV data for a wife, husband, and his parents providing their birth and marriage dates.

Simple Example CSV

Special Cases

The primary REFN column heading can omit the P prefix. None of the REFN headings has a DETAIL part.

NAME fields are assumed to be full names, i.e. Given Names Surname, unless followed by a colon and qualifier GIVEN, SURNAME or SURNAME_FIRST, e.g. P.NAME:GIVEN column. A separate P.NAME:SURNAME column is essential if any surnames contain spaces such as Van De Burg. Otherwise, the last word Burg is assumed to be the surname.

Age at the time of a family event (e.g. marriage) is indicated by a colon and the individual’s identifier. For example, P_S.MARR.Age:P and P_S.MARR.Age:S columns.

Multiple columns can be concatenated into the same field but the columns must be named in a logical manner and be unique, e.g. TEXT1, TEXT2 or NOTE-A, NOTE-B or P.NOTE-A, P.NOTE-B would be combined and imported into P.NOTE using the concatenation separator chosen in the Settings on the first tab. This also works on the address fields but it may be preferable to format the address elements as one column with any required commas in the spreadsheet / CSV file as the use of ‘newline’ may have undesirable effects in the formatting.

Relationships

The marriage or partnership relationships are mapped in the spouse S.REFN column.

A child is mapped to the parents using PF.REFN and PM.REFN columns.

Multiple Partnerships

When one person has multiple partnerships they need multiple rows so that their S.REFN column entries identify each spouse.
As long as both spouses cross-refer to each other via the S.REFN column, their family facts can be added using either row.

Individual Synchronization

The REFN values that identify the same individual throughout the data can also synchronize with the Custom ID of existing records in a Project. Use the FH Tools > Record Identifiers… option to copy Record Id to the Custom ID to sync with those numbers as the REFN values.
Alternatively, REFN integer values can be treated directly as Record Id as explained under Directions and Settings below.

Different CSV rows will use the same REFN value when multiple instances of the same type of fact such as Census or Occupation need to be entered.

Fact Synchronization

The objective here is to allow existing facts in the Project to be supplemented with details supplied by the CSV file.
For example, an existing Birth event with a Date will sync with a CSV Birth event with the same Date and gain Place, Address &/or Note details defined in the CSV columns.
The synchronizing CSV column(s) must precede the new details columns for those values to be added to the synchronized fact.
For example, a P.CENS.DATE of 6 Jun 1841 must precede P.CENS.NOTE to add notes to an existing 6 Jun 1841 Census fact.

The fact synchronization rules are:

  1. The existing and the CSV individual REFN and fact type must match, but attribute values are disregarded.
  2. Only subfield values for Date, Age, and short text such as Place and Cause must sync. Long text such as Address and Note subfields are not involved.
  3. Each such existing fact subfield with a value will sync with an equivalent CSV subfield column with the same value.
  4. An existing fact subfield that is empty will sync with an equivalent CSV subfield column with any value.
  5. An existing fact subfield with any value will sync if there is no equivalent CSV subfield column at all.

A snag with those rules arises when multiple instances of the same fact need to be added but no synchronization subfield values exist.
For example, imagine for the same individual an Occupation attribute is defined on two CSV rows with no Place or Date or Age values.
The first row creates an Occupation in the Project. The second row synchronizes with that created fact via rules 4. & 5. because there are no subfields. So both Occupation attribute values merge into one comma separated list instead of creating a new Occupation fact.
The workaround is to enter unique dummy values in a CSV subfield column such as Place values Z, ZZ, etc, or Date years 9001, 9002, etc, just for those multiple facts.
Then after running the plugin, delete all Place records with those unique values, or use a Fact Query to find and delete those unique values.

Sources

Separate source citation details can be added for each line. Column headings for citation elements are of the form:

SOUR>TITL = Source Title
SOUR.DATE = Entry Date
SOUR.QUAY = Assessment ~ 1, 2, 3, 4 or P, S, Q, U
SOUR.PAGE = Where within Source
SOUR.TEXT = Text From Source
SOUR.NOTE = Citation Note

Without a prefix as above, they are added to every new item. With a RECORD prefix, they add a whole record citation. With a RECORD.FIELD prefix they add Name & Fact citations.

If the SOUR>TITLE matches the Source record Title chosen in the Settings for every item imported then the Citation is applied to that Source.

Sample Plugin Demonstration Data

You can download this sample data in CSV format to demonstrate some functions of the Plugin. It is strongly suggested that you run this plugin in a new, empty Project.

Download the flexible_CSV_importer_sample_import demonstration CSV file.

The data would normally be collated in a spreadsheet (Excel, LibreOffice, etc.). It is important to save the output file in CSV format.

For clarity, the column headings in the sample CSV file are as follows:

REFN, P.NAME, P.SEX, P.CHR.DATE, P.CHR.PLAC, P.CHR.SOUR>TITL, P.BURI.DATE, P.BURI.PLAC, P.BURI.NOTE1, P.BURI.NOTE2, P.BURI.SOUR>TITL, PF.REFN, PM.REFN, S.REFN, P_S.MARR.DATE, P_S.MARR.PLAC, P_S.MARR.ADDR1, P_S.MARR.Address2, P_S.MARR.SOUR>TITL, P_S.MARR.AGE:P, P_S.MARR.AGE:S, P.OCCU, P.OCCU.DATE, P.OCCU.PLAC, P.OCCU.SOUR>TITL, P.OCCU.SOUR.PAGE, P.OCCU.SOUR.TEXT1, P.OCCU.SOUR.TEXT2, P.OCCU.SOUR.DATE, P.OCCU.SOUR.NOTE, P.NOTE1, P.NOTE2, P.NAME.SOUR>TITL, P.NAME.SOUR.PAGE, P.NAME.SOUR.TEXT, P.NAME.SOUR.DATE, P.NAME.SOUR.NOTE1, P.NAME.SOUR.NOTE2, P.SOUR>TITL, P.SOUR.PAGE, P.SOUR.TEXT, P.SOUR.DATE, P.SOUR.QUAY, P.SOUR.NOTE1, P.SOUR.NOTE2, P.CENS.DATE, P.CENS.PLAC, P.CENS.ADDR, P.CENS.SOUR>TITL

These are only examples of input data columns. In the sample data, the Primary REFN = 1 and 2 are repeated to include additional facts.

Running the Plugin
Directions and Settings

Directions and Settings

When this dialogue appears, choose your import CSV file at Step 1 then proceed to Step 2 and Step 3.

If no source record is required and thus no citations, change the Add a source citation to every item imported option to No.

If an existing source record for citations is not desirable, set the Select existing source record for such citations option to No.

There is a choice of separator for multiple note and other text fields, e.g. comma, semicolon, space, newline or two newlines.

Normally each REFN value in the CSV file will be retained in the Custom ID field. If this is not required change the setting to No.

Normally each REFN value is synchronized with Custom ID fields in the Project. If this is not required change the setting to No.
This lets multiple CSV files import extra details for existing people. It also synchronizes facts so extra fields & citations can be added.
To sync with an established Project, use the FH Tools > Record Identifiers… option to copy Record Id to the Custom ID and use those numbers as the REFN values.

Alternatively, set the Treat any unsynced integer REFN as a Record ID option to Yes to synchronize REFN values with Record Id if the record has no Custom ID.

Checking the Mapping

Field Interpretation

The Field Interpretation tab of the plugin is important.

Check this before running the actual Import to your Project at Step 4 of the Directions and Settings tab.

If anything cannot be mapped in the process it appears red in the first column and can be corrected either in the input file and re-imported or modified in the mapping table.

The field interpretation can also be used to re-map specific columns to alternative tags, e.g. birth to christening by selecting the dropdown arrows in the field label and detail label columns.

It is recommended that the data is “cleaned” ready for import in your spreadsheet before creating your CSV file rather than correcting mappings each time during import.
For example:

    1. Ensure Date formats look like (d)d/MMM/yyyy and use “text” format for the cells so the spreadsheet does not try to format the dates.
    2. Enter Notes into a suitable column and delete trailing spaces and special characters with the formula =TRIM(CLEAR(cell-reference)).
    3. Ensure Place names use a consistent format such as City, County, State, Country in a single cell.

The More Information tab provides a summary of the above advice.

More Information


Plugin Flexible CSV Importer

Help content on this page is owned and provided by Mike Tate, the plugin's author, Calico Pie takes no responsibility for its content.